Overview

Brought to you by YData

Dataset statistics

Number of variables: 18
Number of observations: 426880
Missing cells: 1215152
Missing cells (%): 15.8%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 58.6 MiB
Average record size in memory: 144.0 B
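
The percentage and per-record figures above follow from the raw counts. A minimal sketch of that arithmetic, assuming the missing-cell percentage is taken over all cells (variables × observations) and that 1 MiB is 1024 × 1024 bytes; the variable names are illustrative and not part of the report:

```python
# Figures copied from the "Dataset statistics" block above.
n_variables = 18
n_observations = 426880
missing_cells = 1215152
total_size_mib = 58.6  # already rounded in the report, so the result is approximate

# Share of missing cells over every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 MiB = 1024 * 1024 bytes.
avg_record_bytes = total_size_mib * 1024 * 1024 / n_observations

print(f"Missing cells (%): {missing_pct:.1f}")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Because the total size is itself rounded to one decimal, the recomputed record size can differ from the reported 144.0 B in the last digit.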

Variable types

Numeric: 4
Text: 4
Categorical: 10

Alerts

drive is highly overall correlated with type [High correlation]
odometer is highly overall correlated with year [High correlation]
type is highly overall correlated with drive [High correlation]
year is highly overall correlated with odometer [High correlation]
fuel is highly imbalanced (62.7%) [Imbalance]
title_status is highly imbalanced (89.9%) [Imbalance]
manufacturer has 17646 (4.1%) missing values [Missing]
model has 5277 (1.2%) missing values [Missing]
condition has 174104 (40.8%) missing values [Missing]
cylinders has 177678 (41.6%) missing values [Missing]
odometer has 4400 (1.0%) missing values [Missing]
title_status has 8242 (1.9%) missing values [Missing]
VIN has 161042 (37.7%) missing values [Missing]
drive has 130567 (30.6%) missing values [Missing]
size has 306361 (71.8%) missing values [Missing]
type has 92858 (21.8%) missing values [Missing]
paint_color has 130203 (30.5%) missing values [Missing]
price is highly skewed (γ₁ = 254.4069323) [Skewed]
odometer is highly skewed (γ₁ = 38.04001486) [Skewed]
id has unique values [Unique]
price has 32895 (7.7%) zeros [Zeros]
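
The two Imbalance percentages appear consistent with one minus the normalized Shannon entropy of the non-missing category counts. That reading is an assumption checked only against the fuel and title_status counts listed later in this report, not a statement about how the tool is implemented. The function name `imbalance_score` is hypothetical.

```python
import math

def imbalance_score(counts):
    """Return 1 - (Shannon entropy / maximum entropy) for a list of class counts."""
    total = sum(counts)
    n_classes = len(counts)
    if n_classes < 2 or total == 0:
        return 0.0
    # Entropy of the observed class distribution (natural log).
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    # A uniform distribution over n_classes has the maximum entropy log(n_classes).
    return 1 - entropy / math.log(n_classes)

# Non-missing category counts copied from the fuel and title_status sections.
fuel_counts = [356209, 30728, 30062, 5170, 1698]
title_status_counts = [405117, 7219, 3868, 1422, 814, 198]

print(f"fuel: {imbalance_score(fuel_counts):.1%}")
print(f"title_status: {imbalance_score(title_status_counts):.1%}")
```

A score of 0 means perfectly balanced classes; values near 1 mean a single class dominates, as `clean` does for title_status.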

Reproduction

Analysis started: 2025-04-20 03:53:05.951485
Analysis finished: 2025-04-20 03:53:18.424344
Duration: 12.47 seconds
Software version: ydata-profiling v4.16.1
Download configuration: config.json
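
The reported duration is the difference between the two timestamps. A small sketch of that subtraction using only the standard library; the variable names are illustrative:

```python
from datetime import datetime

# Timestamps copied from the Reproduction block above.
started = datetime.fromisoformat("2025-04-20 03:53:05.951485")
finished = datetime.fromisoformat("2025-04-20 03:53:18.424344")

duration_seconds = (finished - started).total_seconds()
print(f"Duration: {duration_seconds:.2f} seconds")
```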

Variables

id
Real number (ℝ)

Unique 

Distinct: 426880
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 7.3114866 × 10^9
Minimum: 7.2074081 × 10^9
Maximum: 7.3171011 × 10^9
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.3 MiB

Quantile statistics

Minimum: 7.2074081 × 10^9
5-th percentile: 7.3031501 × 10^9
Q1: 7.3081433 × 10^9
Median: 7.3126208 × 10^9
Q3: 7.3152535 × 10^9
95-th percentile: 7.3167433 × 10^9
Maximum: 7.3171011 × 10^9
Range: 1.0969296 × 10^8
Interquartile range (IQR): 7110204.2

Descriptive statistics

Standard deviation: 4473170.4
Coefficient of variation (CV): 0.0006118004
Kurtosis: 17.057761
Mean: 7.3114866 × 10^9
Median Absolute Deviation (MAD): 3096588
Skewness: -1.4301233
Sum: 3.1211274 × 10^15
Variance: 2.0009254 × 10^13
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
7222695916 | 1 | < 0.1%
7313139418 | 1 | < 0.1%
7313423023 | 1 | < 0.1%
7313423324 | 1 | < 0.1%
7313424533 | 1 | < 0.1%
7313425823 | 1 | < 0.1%
7313426990 | 1 | < 0.1%
7313427132 | 1 | < 0.1%
7313426423 | 1 | < 0.1%
7313426503 | 1 | < 0.1%
Other values (426870) | 426870 | > 99.9%

Value | Count | Frequency (%)
7207408119 | 1 | < 0.1%
7208549803 | 1 | < 0.1%
7209027818 | 1 | < 0.1%
7209054699 | 1 | < 0.1%
7209064557 | 1 | < 0.1%
7210384030 | 1 | < 0.1%
7212512589 | 1 | < 0.1%
7212631321 | 1 | < 0.1%
7213839225 | 1 | < 0.1%
7213843538 | 1 | < 0.1%

Value | Count | Frequency (%)
7317101084 | 1 | < 0.1%
7317098990 | 1 | < 0.1%
7317098055 | 1 | < 0.1%
7317096748 | 1 | < 0.1%
7317096685 | 1 | < 0.1%
7317096571 | 1 | < 0.1%
7317096373 | 1 | < 0.1%
7317096141 | 1 | < 0.1%
7317096101 | 1 | < 0.1%
7317096069 | 1 | < 0.1%

region
Text

Distinct: 404
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.3 MiB

Length

Max length: 26
Median length: 20
Mean length: 11.44423
Min length: 4

Characters and Unicode

Total characters: 4885313
Distinct characters: 55
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: prescott
2nd row: fayetteville
3rd row: florida keys
4th row: worcester / central MA
5th row: greensboro
Value | Count | Frequency (%)
(blank) | 64305 | 8.6%
city | 12302 | 1.6%
new | 9171 | 1.2%
bay | 8365 | 1.1%
st | 7915 | 1.1%
san | 7639 | 1.0%
south | 7598 | 1.0%
county | 6893 | 0.9%
jersey | 6781 | 0.9%
fort | 6553 | 0.9%
Other values (491) | 610100 | 81.6%

Most occurring characters

ValueCountFrequency (%)
a 476629
 
9.8%
e 409823
 
8.4%
o 365776
 
7.5%
n 348420
 
7.1%
320742
 
6.6%
s 315838
 
6.5%
l 303305
 
6.2%
t 284497
 
5.8%
r 283439
 
5.8%
i 276202
 
5.7%
Other values (45) 1500642
30.7%

Most occurring categories

ValueCountFrequency (%)
(unknown) 4885313
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
a 476629
 
9.8%
e 409823
 
8.4%
o 365776
 
7.5%
n 348420
 
7.1%
320742
 
6.6%
s 315838
 
6.5%
l 303305
 
6.2%
t 284497
 
5.8%
r 283439
 
5.8%
i 276202
 
5.7%
Other values (45) 1500642
30.7%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 4885313
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
a 476629
 
9.8%
e 409823
 
8.4%
o 365776
 
7.5%
n 348420
 
7.1%
320742
 
6.6%
s 315838
 
6.5%
l 303305
 
6.2%
t 284497
 
5.8%
r 283439
 
5.8%
i 276202
 
5.7%
Other values (45) 1500642
30.7%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 4885313
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
a 476629
 
9.8%
e 409823
 
8.4%
o 365776
 
7.5%
n 348420
 
7.1%
320742
 
6.6%
s 315838
 
6.5%
l 303305
 
6.2%
t 284497
 
5.8%
r 283439
 
5.8%
i 276202
 
5.7%
Other values (45) 1500642
30.7%

price
Real number (ℝ)

Skewed  Zeros 

Distinct: 15655
Distinct (%): 3.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 75199.033
Minimum: 0
Maximum: 3.7369287 × 10^9
Zeros: 32895
Zeros (%): 7.7%
Negative: 0
Negative (%): 0.0%
Memory size: 3.3 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 5900
Median: 13950
Q3: 26485.75
95-th percentile: 44500
Maximum: 3.7369287 × 10^9
Range: 3.7369287 × 10^9
Interquartile range (IQR): 20585.75
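
With a maximum several orders of magnitude above Q3, the quartiles are a more useful guide to typical prices than the mean. One common way to turn them into outlier bounds is Tukey's 1.5 × IQR rule. The report itself does not apply this rule; the sketch below is an illustration under that assumption, and `is_outlier` is a hypothetical helper:

```python
# Quartiles copied from the price quantile statistics above.
q1 = 5900.0
q3 = 26485.75

iqr = q3 - q1

# Tukey fences: values beyond 1.5 * IQR from the quartiles are flagged.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

def is_outlier(price):
    """Return True when a price falls outside the Tukey fences."""
    return price < lower_fence or price > upper_fence

print(f"IQR: {iqr}")
print(f"Fences: [{lower_fence}, {upper_fence}]")
```

Because the lower fence is negative, this rule alone does not flag the zero-priced listings; those need a separate check.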

Descriptive statistics

Standard deviation: 12182282
Coefficient of variation (CV): 162.00052
Kurtosis: 69205.089
Mean: 75199.033
Median Absolute Deviation (MAD): 9450
Skewness: 254.40693
Sum: 3.2100963 × 10^10
Variance: 1.48408 × 10^14
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
0 | 32895 | 7.7%
6995 | 3169 | 0.7%
7995 | 3129 | 0.7%
9995 | 2867 | 0.7%
8995 | 2837 | 0.7%
4500 | 2778 | 0.7%
5995 | 2727 | 0.6%
3500 | 2716 | 0.6%
29990 | 2705 | 0.6%
6500 | 2594 | 0.6%
Other values (15645) | 368463 | 86.3%

Value | Count | Frequency (%)
0 | 32895 | 7.7%
1 | 1951 | 0.5%
2 | 13 | < 0.1%
3 | 9 | < 0.1%
4 | 4 | < 0.1%
5 | 16 | < 0.1%
6 | 12 | < 0.1%
7 | 8 | < 0.1%
8 | 7 | < 0.1%
9 | 14 | < 0.1%

Value | Count | Frequency (%)
3736928711 | 2 | < 0.1%
3024942282 | 2 | < 0.1%
3009548743 | 1 | < 0.1%
1410065407 | 1 | < 0.1%
1234567890 | 1 | < 0.1%
1111111111 | 2 | < 0.1%
987654321 | 2 | < 0.1%
135008900 | 1 | < 0.1%
123456789 | 6 | < 0.1%
113456789 | 1 | < 0.1%
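
Several of the extreme prices listed above (0, 1, 1111111111, 123456789, 987654321) look like typed-in placeholders rather than real asking prices. A rough heuristic for flagging them is sketched below. The rules, the six-digit threshold and the name `looks_like_placeholder` are assumptions made for illustration; they are not part of the report and will miss values such as 113456789.

```python
def looks_like_placeholder(price: int) -> bool:
    """Heuristic: flag prices that look typed-in rather than real."""
    if price <= 1:
        return True  # 0 and 1 dominate the low end of the price values
    digits = str(price)
    if len(digits) >= 6:
        if len(set(digits)) == 1:
            return True  # one repeated digit, e.g. 1111111111
        if digits in "1234567890" or digits in "9876543210":
            return True  # ascending or descending keyboard run
    return False

sample_prices = [0, 1, 6995, 29990, 123456789, 987654321, 1111111111, 3736928711]
flagged = [p for p in sample_prices if looks_like_placeholder(p)]
print(flagged)
```

A filter like this is best combined with a plausibility range, since values such as 3736928711 pass the pattern checks but are still not credible prices.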

year
Real number (ℝ)

High correlation 

Distinct: 114
Distinct (%): < 0.1%
Missing: 1205
Missing (%): 0.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 2011.2352
Minimum: 1900
Maximum: 2022
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.3 MiB

Quantile statistics

Minimum: 1900
5-th percentile: 1998
Q1: 2008
Median: 2013
Q3: 2017
95-th percentile: 2020
Maximum: 2022
Range: 122
Interquartile range (IQR): 9

Descriptive statistics

Standard deviation: 9.4521196
Coefficient of variation (CV): 0.004699659
Kurtosis: 19.579889
Mean: 2011.2352
Median Absolute Deviation (MAD): 4
Skewness: -3.5779204
Sum: 8.5613254 × 10^8
Variance: 89.342565
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
2017 | 36420 | 8.5%
2018 | 36369 | 8.5%
2015 | 31538 | 7.4%
2013 | 30794 | 7.2%
2016 | 30434 | 7.1%
2014 | 30283 | 7.1%
2019 | 25375 | 5.9%
2012 | 23898 | 5.6%
2011 | 20341 | 4.8%
2020 | 19298 | 4.5%
Other values (104) | 140925 | 33.0%

Value | Count | Frequency (%)
1900 | 12 | < 0.1%
1901 | 3 | < 0.1%
1902 | 1 | < 0.1%
1903 | 12 | < 0.1%
1905 | 1 | < 0.1%
1909 | 1 | < 0.1%
1910 | 2 | < 0.1%
1913 | 2 | < 0.1%
1915 | 1 | < 0.1%
1916 | 2 | < 0.1%

Value | Count | Frequency (%)
2022 | 133 | < 0.1%
2021 | 2396 | 0.6%
2020 | 19298 | 4.5%
2019 | 25375 | 5.9%
2018 | 36369 | 8.5%
2017 | 36420 | 8.5%
2016 | 30434 | 7.1%
2015 | 31538 | 7.4%
2014 | 30283 | 7.1%
2013 | 30794 | 7.2%

manufacturer
Categorical

Missing 

Distinct: 42
Distinct (%): < 0.1%
Missing: 17646
Missing (%): 4.1%
Memory size: 3.3 MiB
ford: 70985
chevrolet: 55064
toyota: 34202
honda: 21269
nissan: 19067
Other values (37): 208647

Length

Max length: 15
Median length: 12
Mean length: 5.7946578
Min length: 3

Characters and Unicode

Total characters: 2371371
Distinct characters: 27
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: gmc
2nd row: chevrolet
3rd row: chevrolet
4th row: toyota
5th row: ford

Common Values

Value | Count | Frequency (%)
ford | 70985 | 16.6%
chevrolet | 55064 | 12.9%
toyota | 34202 | 8.0%
honda | 21269 | 5.0%
nissan | 19067 | 4.5%
jeep | 19014 | 4.5%
ram | 18342 | 4.3%
gmc | 16785 | 3.9%
bmw | 14699 | 3.4%
dodge | 13707 | 3.2%
Other values (32) | 126100 | 29.5%
(Missing) | 17646 | 4.1%

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
ford | 70985 | 17.3%
chevrolet | 55064 | 13.5%
toyota | 34202 | 8.4%
honda | 21269 | 5.2%
nissan | 19067 | 4.7%
jeep | 19014 | 4.6%
ram | 18342 | 4.5%
gmc | 16785 | 4.1%
bmw | 14699 | 3.6%
dodge | 13707 | 3.3%
Other values (32) | 126121 | 30.8%

Most occurring characters

ValueCountFrequency (%)
o 257522
 
10.9%
e 239422
 
10.1%
r 196161
 
8.3%
a 186064
 
7.8%
d 162166
 
6.8%
t 136711
 
5.8%
c 124158
 
5.2%
n 114989
 
4.8%
l 106299
 
4.5%
i 99297
 
4.2%
Other values (17) 748582
31.6%

Most occurring categories

ValueCountFrequency (%)
(unknown) 2371371
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
o 257522
 
10.9%
e 239422
 
10.1%
r 196161
 
8.3%
a 186064
 
7.8%
d 162166
 
6.8%
t 136711
 
5.8%
c 124158
 
5.2%
n 114989
 
4.8%
l 106299
 
4.5%
i 99297
 
4.2%
Other values (17) 748582
31.6%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 2371371
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
o 257522
 
10.9%
e 239422
 
10.1%
r 196161
 
8.3%
a 186064
 
7.8%
d 162166
 
6.8%
t 136711
 
5.8%
c 124158
 
5.2%
n 114989
 
4.8%
l 106299
 
4.5%
i 99297
 
4.2%
Other values (17) 748582
31.6%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 2371371
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
o 257522
 
10.9%
e 239422
 
10.1%
r 196161
 
8.3%
a 186064
 
7.8%
d 162166
 
6.8%
t 136711
 
5.8%
c 124158
 
5.2%
n 114989
 
4.8%
l 106299
 
4.5%
i 99297
 
4.2%
Other values (17) 748582
31.6%

model
Text

Missing 

Distinct: 29649
Distinct (%): 7.0%
Missing: 5277
Missing (%): 1.2%
Memory size: 3.3 MiB

Length

Max length: 203
Median length: 177
Mean length: 11.91973
Min length: 1

Characters and Unicode

Total characters: 5025394
Distinct characters: 117
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 15290
Unique (%): 3.6%

Sample

1st row: sierra 1500 crew cab slt
2nd row: silverado 1500
3rd row: silverado 1500 crew
4th row: tundra double cab sr
5th row: f-150 xlt
Value | Count | Frequency (%)
1500 | 24082 | 2.6%
sport | 23261 | 2.6%
4d | 18645 | 2.1%
silverado | 17396 | 1.9%
sedan | 15508 | 1.7%
cab | 15224 | 1.7%
f-150 | 10417 | 1.1%
4x4 | 9664 | 1.1%
grand | 8913 | 1.0%
sierra | 8703 | 1.0%
Other values (8692) | 757366 | 83.3%

Most occurring characters

ValueCountFrequency (%)
487722
 
9.7%
e 395746
 
7.9%
a 371389
 
7.4%
r 366159
 
7.3%
s 278320
 
5.5%
t 259161
 
5.2%
i 237544
 
4.7%
o 228379
 
4.5%
l 212188
 
4.2%
c 206496
 
4.1%
Other values (107) 1982290
39.4%

Most occurring categories

ValueCountFrequency (%)
(unknown) 5025394
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
487722
 
9.7%
e 395746
 
7.9%
a 371389
 
7.4%
r 366159
 
7.3%
s 278320
 
5.5%
t 259161
 
5.2%
i 237544
 
4.7%
o 228379
 
4.5%
l 212188
 
4.2%
c 206496
 
4.1%
Other values (107) 1982290
39.4%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 5025394
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
487722
 
9.7%
e 395746
 
7.9%
a 371389
 
7.4%
r 366159
 
7.3%
s 278320
 
5.5%
t 259161
 
5.2%
i 237544
 
4.7%
o 228379
 
4.5%
l 212188
 
4.2%
c 206496
 
4.1%
Other values (107) 1982290
39.4%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 5025394
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
487722
 
9.7%
e 395746
 
7.9%
a 371389
 
7.4%
r 366159
 
7.3%
s 278320
 
5.5%
t 259161
 
5.2%
i 237544
 
4.7%
o 228379
 
4.5%
l 212188
 
4.2%
c 206496
 
4.1%
Other values (107) 1982290
39.4%

condition
Categorical

Missing 

Distinct: 6
Distinct (%): < 0.1%
Missing: 174104
Missing (%): 40.8%
Memory size: 3.3 MiB
good: 121456
excellent: 101467
like new: 21178
fair: 6769
new: 1305

Length

Max length: 9
Median length: 4
Mean length: 6.3441506
Min length: 3

Characters and Unicode

Total characters: 1603649
Distinct characters: 18
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: good
2nd row: good
3rd row: good
4th row: good
5th row: excellent

Common Values

Value | Count | Frequency (%)
good | 121456 | 28.5%
excellent | 101467 | 23.8%
like new | 21178 | 5.0%
fair | 6769 | 1.6%
new | 1305 | 0.3%
salvage | 601 | 0.1%
(Missing) | 174104 | 40.8%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
good | 121456 | 44.3%
excellent | 101467 | 37.0%
new | 22483 | 8.2%
like | 21178 | 7.7%
fair | 6769 | 2.5%
salvage | 601 | 0.2%
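
The table above counts whitespace-separated words rather than whole categories, which is why "new" (22483) exceeds the 1305 rows labelled `new`: it also picks up the 21178 `like new` rows. A short sketch that rebuilds the word counts from the category counts in the Common Values table, assuming a plain whitespace split:

```python
from collections import Counter

# Category counts copied from the condition "Common Values" table.
condition_counts = {
    "good": 121456,
    "excellent": 101467,
    "like new": 21178,
    "fair": 6769,
    "new": 1305,
    "salvage": 601,
}

# Each word in a category label inherits that category's row count.
word_counts = Counter()
for category, count in condition_counts.items():
    for word in category.split():
        word_counts[word] += count

total_words = sum(word_counts.values())
for word, count in word_counts.most_common():
    print(f"{word}: {count} ({count / total_words:.1%})")
```

The percentages here are shares of all words, not of rows, so they differ from the Common Values percentages.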

Most occurring characters

ValueCountFrequency (%)
e 348663
21.7%
o 242912
15.1%
l 224713
14.0%
n 123950
 
7.7%
g 122057
 
7.6%
d 121456
 
7.6%
x 101467
 
6.3%
c 101467
 
6.3%
t 101467
 
6.3%
i 27947
 
1.7%
Other values (8) 87550
 
5.5%

Most occurring categories

ValueCountFrequency (%)
(unknown) 1603649
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
e 348663
21.7%
o 242912
15.1%
l 224713
14.0%
n 123950
 
7.7%
g 122057
 
7.6%
d 121456
 
7.6%
x 101467
 
6.3%
c 101467
 
6.3%
t 101467
 
6.3%
i 27947
 
1.7%
Other values (8) 87550
 
5.5%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 1603649
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
e 348663
21.7%
o 242912
15.1%
l 224713
14.0%
n 123950
 
7.7%
g 122057
 
7.6%
d 121456
 
7.6%
x 101467
 
6.3%
c 101467
 
6.3%
t 101467
 
6.3%
i 27947
 
1.7%
Other values (8) 87550
 
5.5%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 1603649
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
e 348663
21.7%
o 242912
15.1%
l 224713
14.0%
n 123950
 
7.7%
g 122057
 
7.6%
d 121456
 
7.6%
x 101467
 
6.3%
c 101467
 
6.3%
t 101467
 
6.3%
i 27947
 
1.7%
Other values (8) 87550
 
5.5%

cylinders
Categorical

Missing 

Distinct: 8
Distinct (%): < 0.1%
Missing: 177678
Missing (%): 41.6%
Memory size: 3.3 MiB
6 cylinders: 94169
4 cylinders: 77642
8 cylinders: 72062
5 cylinders: 1712
10 cylinders: 1455
Other values (3): 2162

Length

Max length: 12
Median length: 11
Mean length: 10.975426
Min length: 5

Characters and Unicode

Total characters: 2735098
Distinct characters: 21
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 8 cylinders
2nd row: 8 cylinders
3rd row: 8 cylinders
4th row: 8 cylinders
5th row: 6 cylinders

Common Values

Value | Count | Frequency (%)
6 cylinders | 94169 | 22.1%
4 cylinders | 77642 | 18.2%
8 cylinders | 72062 | 16.9%
5 cylinders | 1712 | 0.4%
10 cylinders | 1455 | 0.3%
other | 1298 | 0.3%
3 cylinders | 655 | 0.2%
12 cylinders | 209 | < 0.1%
(Missing) | 177678 | 41.6%
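
The cylinder count is stored as text ("6 cylinders") with an `other` bucket, so it is profiled as a categorical variable. If a numeric column is wanted, one possible approach is sketched below; `parse_cylinders` is a hypothetical helper, and mapping `other` and missing values to `None` is an assumption about how they should be treated:

```python
import re

def parse_cylinders(value):
    """Return the cylinder count as an int, or None for 'other', missing or unparseable values."""
    if not isinstance(value, str):
        return None  # covers None and NaN-style missing markers
    match = re.fullmatch(r"(\d+) cylinders", value.strip())
    return int(match.group(1)) if match else None

raw_values = ["6 cylinders", "12 cylinders", "other", None]
print([parse_cylinders(v) for v in raw_values])
```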

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
cylinders | 247904 | 49.9%
6 | 94169 | 18.9%
4 | 77642 | 15.6%
8 | 72062 | 14.5%
5 | 1712 | 0.3%
10 | 1455 | 0.3%
other | 1298 | 0.3%
3 | 655 | 0.1%
12 | 209 | < 0.1%

Most occurring characters

ValueCountFrequency (%)
e 249202
9.1%
r 249202
9.1%
s 247904
9.1%
247904
9.1%
c 247904
9.1%
y 247904
9.1%
l 247904
9.1%
i 247904
9.1%
n 247904
9.1%
d 247904
9.1%
Other values (11) 253462
9.3%

Most occurring categories

ValueCountFrequency (%)
(unknown) 2735098
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
e 249202
9.1%
r 249202
9.1%
s 247904
9.1%
247904
9.1%
c 247904
9.1%
y 247904
9.1%
l 247904
9.1%
i 247904
9.1%
n 247904
9.1%
d 247904
9.1%
Other values (11) 253462
9.3%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 2735098
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
e 249202
9.1%
r 249202
9.1%
s 247904
9.1%
247904
9.1%
c 247904
9.1%
y 247904
9.1%
l 247904
9.1%
i 247904
9.1%
n 247904
9.1%
d 247904
9.1%
Other values (11) 253462
9.3%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 2735098
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
e 249202
9.1%
r 249202
9.1%
s 247904
9.1%
247904
9.1%
c 247904
9.1%
y 247904
9.1%
l 247904
9.1%
i 247904
9.1%
n 247904
9.1%
d 247904
9.1%
Other values (11) 253462
9.3%

fuel
Categorical

Imbalance 

Distinct: 5
Distinct (%): < 0.1%
Missing: 3013
Missing (%): 0.7%
Memory size: 3.3 MiB
gas: 356209
other: 30728
diesel: 30062
hybrid: 5170
electric: 1698

Length

Max length: 8
Median length: 3
Mean length: 3.41438
Min length: 3

Characters and Unicode

Total characters: 1447243
Distinct characters: 14
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: gas
2nd row: gas
3rd row: gas
4th row: gas
5th row: gas

Common Values

Value | Count | Frequency (%)
gas | 356209 | 83.4%
other | 30728 | 7.2%
diesel | 30062 | 7.0%
hybrid | 5170 | 1.2%
electric | 1698 | 0.4%
(Missing) | 3013 | 0.7%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
gas | 356209 | 84.0%
other | 30728 | 7.2%
diesel | 30062 | 7.1%
hybrid | 5170 | 1.2%
electric | 1698 | 0.4%

Most occurring characters

ValueCountFrequency (%)
s 386271
26.7%
g 356209
24.6%
a 356209
24.6%
e 94248
 
6.5%
r 37596
 
2.6%
i 36930
 
2.6%
h 35898
 
2.5%
d 35232
 
2.4%
t 32426
 
2.2%
l 31760
 
2.2%
Other values (4) 44464
 
3.1%

Most occurring categories

ValueCountFrequency (%)
(unknown) 1447243
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
s 386271
26.7%
g 356209
24.6%
a 356209
24.6%
e 94248
 
6.5%
r 37596
 
2.6%
i 36930
 
2.6%
h 35898
 
2.5%
d 35232
 
2.4%
t 32426
 
2.2%
l 31760
 
2.2%
Other values (4) 44464
 
3.1%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 1447243
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
s 386271
26.7%
g 356209
24.6%
a 356209
24.6%
e 94248
 
6.5%
r 37596
 
2.6%
i 36930
 
2.6%
h 35898
 
2.5%
d 35232
 
2.4%
t 32426
 
2.2%
l 31760
 
2.2%
Other values (4) 44464
 
3.1%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 1447243
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
s 386271
26.7%
g 356209
24.6%
a 356209
24.6%
e 94248
 
6.5%
r 37596
 
2.6%
i 36930
 
2.6%
h 35898
 
2.5%
d 35232
 
2.4%
t 32426
 
2.2%
l 31760
 
2.2%
Other values (4) 44464
 
3.1%

odometer
Real number (ℝ)

High correlation  Missing  Skewed 

Distinct: 104870
Distinct (%): 24.8%
Missing: 4400
Missing (%): 1.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 98043.331
Minimum: 0
Maximum: 10000000
Zeros: 1965
Zeros (%): 0.5%
Negative: 0
Negative (%): 0.0%
Memory size: 3.3 MiB

Quantile statistics

Minimum: 0
5-th percentile: 6318
Q1: 37704
Median: 85548
Q3: 133542.5
95-th percentile: 204000
Maximum: 10000000
Range: 10000000
Interquartile range (IQR): 95838.5

Descriptive statistics

Standard deviation: 213881.5
Coefficient of variation (CV): 2.1814997
Kurtosis: 1690.7574
Mean: 98043.331
Median Absolute Deviation (MAD): 47910.5
Skewness: 38.040015
Sum: 4.1421347 × 10^10
Variance: 4.5745296 × 10^10
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
100000 | 2263 | 0.5%
1 | 2246 | 0.5%
0 | 1965 | 0.5%
200000 | 1728 | 0.4%
150000 | 1603 | 0.4%
160000 | 1250 | 0.3%
140000 | 1244 | 0.3%
130000 | 1204 | 0.3%
120000 | 1199 | 0.3%
180000 | 1062 | 0.2%
Other values (104860) | 406716 | 95.3%
(Missing) | 4400 | 1.0%

Value | Count | Frequency (%)
0 | 1965 | 0.5%
1 | 2246 | 0.5%
2 | 153 | < 0.1%
3 | 58 | < 0.1%
4 | 138 | < 0.1%
5 | 193 | < 0.1%
6 | 33 | < 0.1%
7 | 69 | < 0.1%
8 | 37 | < 0.1%
9 | 38 | < 0.1%

Value | Count | Frequency (%)
10000000 | 50 | < 0.1%
9999999 | 88 | < 0.1%
9876543 | 1 | < 0.1%
9750924 | 1 | < 0.1%
9099999 | 1 | < 0.1%
9000000 | 3 | < 0.1%
8888888 | 4 | < 0.1%
8765548 | 1 | < 0.1%
8675309 | 1 | < 0.1%
8393929 | 1 | < 0.1%

title_status
Categorical

Imbalance  Missing 

Distinct: 6
Distinct (%): < 0.1%
Missing: 8242
Missing (%): 1.9%
Memory size: 3.3 MiB
clean: 405117
rebuilt: 7219
salvage: 3868
lien: 1422
missing: 814

Length

Max length: 10
Median length: 5
Mean length: 5.0558239
Min length: 4

Characters and Unicode

Total characters: 2116560
Distinct characters: 18
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: clean
2nd row: clean
3rd row: clean
4th row: clean
5th row: clean

Common Values

Value | Count | Frequency (%)
clean | 405117 | 94.9%
rebuilt | 7219 | 1.7%
salvage | 3868 | 0.9%
lien | 1422 | 0.3%
missing | 814 | 0.2%
parts only | 198 | < 0.1%
(Missing) | 8242 | 1.9%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
clean | 405117 | 96.7%
rebuilt | 7219 | 1.7%
salvage | 3868 | 0.9%
lien | 1422 | 0.3%
missing | 814 | 0.2%
parts | 198 | < 0.1%
only | 198 | < 0.1%

Most occurring characters

ValueCountFrequency (%)
l 417824
19.7%
e 417626
19.7%
a 413051
19.5%
n 407551
19.3%
c 405117
19.1%
i 10269
 
0.5%
t 7417
 
0.4%
r 7417
 
0.4%
u 7219
 
0.3%
b 7219
 
0.3%
Other values (8) 15850
 
0.7%

Most occurring categories

ValueCountFrequency (%)
(unknown) 2116560
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
l 417824
19.7%
e 417626
19.7%
a 413051
19.5%
n 407551
19.3%
c 405117
19.1%
i 10269
 
0.5%
t 7417
 
0.4%
r 7417
 
0.4%
u 7219
 
0.3%
b 7219
 
0.3%
Other values (8) 15850
 
0.7%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 2116560
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
l 417824
19.7%
e 417626
19.7%
a 413051
19.5%
n 407551
19.3%
c 405117
19.1%
i 10269
 
0.5%
t 7417
 
0.4%
r 7417
 
0.4%
u 7219
 
0.3%
b 7219
 
0.3%
Other values (8) 15850
 
0.7%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 2116560
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
l 417824
19.7%
e 417626
19.7%
a 413051
19.5%
n 407551
19.3%
c 405117
19.1%
i 10269
 
0.5%
t 7417
 
0.4%
r 7417
 
0.4%
u 7219
 
0.3%
b 7219
 
0.3%
Other values (8) 15850
 
0.7%

transmission
Categorical

Distinct: 3
Distinct (%): < 0.1%
Missing: 2556
Missing (%): 0.6%
Memory size: 3.3 MiB
automatic: 336524
other: 62682
manual: 25118

Length

Max length: 9
Median length: 9
Mean length: 8.2315259
Min length: 5

Characters and Unicode

Total characters: 3492834
Distinct characters: 12
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: other
2nd row: other
3rd row: other
4th row: other
5th row: automatic

Common Values

Value | Count | Frequency (%)
automatic | 336524 | 78.8%
other | 62682 | 14.7%
manual | 25118 | 5.9%
(Missing) | 2556 | 0.6%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
automatic | 336524 | 79.3%
other | 62682 | 14.8%
manual | 25118 | 5.9%

Most occurring characters

ValueCountFrequency (%)
t 735730
21.1%
a 723284
20.7%
o 399206
11.4%
u 361642
10.4%
m 361642
10.4%
i 336524
9.6%
c 336524
9.6%
h 62682
 
1.8%
e 62682
 
1.8%
r 62682
 
1.8%
Other values (2) 50236
 
1.4%

Most occurring categories

ValueCountFrequency (%)
(unknown) 3492834
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
t 735730
21.1%
a 723284
20.7%
o 399206
11.4%
u 361642
10.4%
m 361642
10.4%
i 336524
9.6%
c 336524
9.6%
h 62682
 
1.8%
e 62682
 
1.8%
r 62682
 
1.8%
Other values (2) 50236
 
1.4%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 3492834
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
t 735730
21.1%
a 723284
20.7%
o 399206
11.4%
u 361642
10.4%
m 361642
10.4%
i 336524
9.6%
c 336524
9.6%
h 62682
 
1.8%
e 62682
 
1.8%
r 62682
 
1.8%
Other values (2) 50236
 
1.4%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 3492834
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
t 735730
21.1%
a 723284
20.7%
o 399206
11.4%
u 361642
10.4%
m 361642
10.4%
i 336524
9.6%
c 336524
9.6%
h 62682
 
1.8%
e 62682
 
1.8%
r 62682
 
1.8%
Other values (2) 50236
 
1.4%

VIN
Text

Missing 

Distinct: 118246
Distinct (%): 44.5%
Missing: 161042
Missing (%): 37.7%
Memory size: 3.3 MiB

Length

Max length: 24
Median length: 17
Mean length: 16.958757
Min length: 1

Characters and Unicode

Total characters: 4508282
Distinct characters: 38
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 77966
Unique (%): 29.3%

Sample

1st row: 3GTP1VEC4EG551563
2nd row: 1GCSCSE06AZ123805
3rd row: 3GCPWCED5LG130317
4th row: 5TFRM5F17HX120972
5th row: 1GT220CG8CZ231238
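
The length statistics above (minimum 1, maximum 24) suggest that some entries are not well-formed VINs, since a modern VIN has exactly 17 characters and does not use the letters I, O or Q. A minimal format check under that assumption is sketched below. It does not verify the check digit, and `is_well_formed_vin` is a hypothetical helper name:

```python
import re

# Assumption: 17 characters drawn from digits and capital letters,
# excluding I, O and Q. The check digit is not verified here.
VIN_PATTERN = re.compile(r"[A-HJ-NPR-Z0-9]{17}")

def is_well_formed_vin(value):
    """Return True when value has the shape of a 17-character VIN."""
    if not isinstance(value, str):
        return False  # missing values are never well-formed
    return VIN_PATTERN.fullmatch(value.strip().upper()) is not None

samples = ["3GTP1VEC4EG551563", "1fmju1jt1hea52352", "1", None]
print([is_well_formed_vin(v) for v in samples])
```

Upper-casing before matching lets the check accept the lower-cased values that appear in the frequency table as well as the upper-case sample rows.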
Value | Count | Frequency (%)
1fmju1jt1hea52352 | 261 | 0.1%
3c6jr6dt3kg560649 | 235 | 0.1%
1fter1eh1lla36301 | 231 | 0.1%
5tftx4cn3ex042751 | 227 | 0.1%
1gchtce37g1186784 | 214 | 0.1%
1gtn1teh5ez273019 | 207 | 0.1%
3vwf17at1fm655022 | 199 | 0.1%
jn1az4eh8km420880 | 198 | 0.1%
1ftmf1cp3gkd62143 | 195 | 0.1%
1gtr1we07dz143724 | 194 | 0.1%
Other values (118236) | 263677 | 99.2%

Most occurring characters

ValueCountFrequency (%)
1 425101
 
9.4%
2 279570
 
6.2%
3 273707
 
6.1%
5 273040
 
6.1%
4 241769
 
5.4%
0 241757
 
5.4%
6 223745
 
5.0%
7 209685
 
4.7%
8 198070
 
4.4%
9 172062
 
3.8%
Other values (28) 1969776
43.7%

Most occurring categories

ValueCountFrequency (%)
(unknown) 4508282
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
1 425101
 
9.4%
2 279570
 
6.2%
3 273707
 
6.1%
5 273040
 
6.1%
4 241769
 
5.4%
0 241757
 
5.4%
6 223745
 
5.0%
7 209685
 
4.7%
8 198070
 
4.4%
9 172062
 
3.8%
Other values (28) 1969776
43.7%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 4508282
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
1 425101
 
9.4%
2 279570
 
6.2%
3 273707
 
6.1%
5 273040
 
6.1%
4 241769
 
5.4%
0 241757
 
5.4%
6 223745
 
5.0%
7 209685
 
4.7%
8 198070
 
4.4%
9 172062
 
3.8%
Other values (28) 1969776
43.7%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 4508282
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
1 425101
 
9.4%
2 279570
 
6.2%
3 273707
 
6.1%
5 273040
 
6.1%
4 241769
 
5.4%
0 241757
 
5.4%
6 223745
 
5.0%
7 209685
 
4.7%
8 198070
 
4.4%
9 172062
 
3.8%
Other values (28) 1969776
43.7%

drive
Categorical

High correlation  Missing 

Distinct: 3
Distinct (%): < 0.1%
Missing: 130567
Missing (%): 30.6%
Memory size: 3.3 MiB
4wd: 131904
fwd: 105517
rwd: 58892

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 888939
Distinct characters: 5
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: rwd
2nd row: 4wd
3rd row: 4wd
4th row: 4wd
5th row: 4wd

Common Values

Value | Count | Frequency (%)
4wd | 131904 | 30.9%
fwd | 105517 | 24.7%
rwd | 58892 | 13.8%
(Missing) | 130567 | 30.6%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
4wd | 131904 | 44.5%
fwd | 105517 | 35.6%
rwd | 58892 | 19.9%

Most occurring characters

ValueCountFrequency (%)
w 296313
33.3%
d 296313
33.3%
4 131904
14.8%
f 105517
 
11.9%
r 58892
 
6.6%

Most occurring categories

ValueCountFrequency (%)
(unknown) 888939
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
w 296313
33.3%
d 296313
33.3%
4 131904
14.8%
f 105517
 
11.9%
r 58892
 
6.6%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 888939
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
w 296313
33.3%
d 296313
33.3%
4 131904
14.8%
f 105517
 
11.9%
r 58892
 
6.6%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 888939
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
w 296313
33.3%
d 296313
33.3%
4 131904
14.8%
f 105517
 
11.9%
r 58892
 
6.6%

size
Categorical

Missing 

Distinct: 4
Distinct (%): < 0.1%
Missing: 306361
Missing (%): 71.8%
Memory size: 3.3 MiB
full-size: 63465
mid-size: 34476
compact: 19384
sub-compact: 3194

Length

Max length: 11
Median length: 9
Mean length: 8.4452659
Min length: 7

Characters and Unicode

Total characters: 1017815
Distinct characters: 16
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: full-size
2nd row: full-size
3rd row: full-size
4th row: full-size
5th row: full-size

Common Values

Value | Count | Frequency (%)
full-size | 63465 | 14.9%
mid-size | 34476 | 8.1%
compact | 19384 | 4.5%
sub-compact | 3194 | 0.7%
(Missing) | 306361 | 71.8%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
full-size | 63465 | 52.7%
mid-size | 34476 | 28.6%
compact | 19384 | 16.1%
sub-compact | 3194 | 2.7%

Most occurring characters

Value             Count   Frequency (%)
i                 132417  13.0%
l                 126930  12.5%
-                 101135  9.9%
s                 101135  9.9%
z                 97941   9.6%
e                 97941   9.6%
u                 66659   6.5%
f                 63465   6.2%
m                 57054   5.6%
c                 45156   4.4%
Other values (6)  127982  12.6%

Most occurring categories

Value      Count    Frequency (%)
(unknown)  1017815  100.0%

Most occurring scripts

Value      Count    Frequency (%)
(unknown)  1017815  100.0%

Most occurring blocks

Value      Count    Frequency (%)
(unknown)  1017815  100.0%

type
Categorical

High correlation  Missing 

Distinct      13
Distinct (%)  < 0.1%
Missing       92858
Missing (%)   21.8%
Memory size   3.3 MiB

sedan             87056
SUV               77284
pickup            43510
truck             35279
other             22110
Other values (8)  68783

Length

Max length     11
Median length  5
Mean length    4.9978534
Min length     3

Characters and Unicode

Total characters     1669393
Distinct characters  25
Distinct categories  1
Distinct scripts     1
Distinct blocks      1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique      0
Unique (%)  0.0%

Sample

1st row  pickup
2nd row  pickup
3rd row  pickup
4th row  pickup
5th row  truck

Common Values

Value             Count  Frequency (%)
sedan             87056  20.4%
SUV               77284  18.1%
pickup            43510  10.2%
truck             35279  8.3%
other             22110  5.2%
coupe             19204  4.5%
hatchback         16598  3.9%
wagon             10751  2.5%
van               8548   2.0%
convertible       7731   1.8%
Other values (3)  5951   1.4%
(Missing)         92858  21.8%

Length

[Figure: Histogram of lengths of the category]

Value             Count  Frequency (%)
sedan             87056  26.1%
suv               77284  23.1%
pickup            43510  13.0%
truck             35279  10.6%
other             22110  6.6%
coupe             19204  5.7%
hatchback         16598  5.0%
wagon             10751  3.2%
van               8548   2.6%
convertible       7731   2.3%
Other values (3)  5951   1.8%

Most occurring characters

Value              Count   Frequency (%)
a                  144985  8.7%
e                  143832  8.6%
c                  138920  8.3%
n                  123736  7.4%
p                  106224  6.4%
u                  98510   5.9%
k                  95387   5.7%
d                  87665   5.3%
s                  87573   5.2%
t                  81718   4.9%
Other values (15)  560843  33.6%

Most occurring categories

Value      Count    Frequency (%)
(unknown)  1669393  100.0%

Most occurring scripts

Value      Count    Frequency (%)
(unknown)  1669393  100.0%

Most occurring blocks

Value      Count    Frequency (%)
(unknown)  1669393  100.0%

paint_color
Categorical

Missing 

Distinct      12
Distinct (%)  < 0.1%
Missing       130203
Missing (%)   30.5%
Memory size   3.3 MiB

white             79285
black             62861
silver            42970
blue              31223
red               30473
Other values (7)  49865

Length

Max length     6
Median length  5
Mean length    4.7906747
Min length     3

Characters and Unicode

Total characters     1421283
Distinct characters  21
Distinct categories  1
Distinct scripts     1
Distinct blocks      1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique      0
Unique (%)  0.0%

Sample

1st row  white
2nd row  blue
3rd row  red
4th row  red
5th row  black

Common Values

Value             Count   Frequency (%)
white             79285   18.6%
black             62861   14.7%
silver            42970   10.1%
blue              31223   7.3%
red               30473   7.1%
grey              24416   5.7%
green             7343    1.7%
custom            6700    1.6%
brown             6593    1.5%
yellow            2142    0.5%
Other values (2)  2671    0.6%
(Missing)         130203  30.5%

Length

[Figure: Histogram of lengths of the category]

Value             Count  Frequency (%)
white             79285  26.7%
black             62861  21.2%
silver            42970  14.5%
blue              31223  10.5%
red               30473  10.3%
grey              24416  8.2%
green             7343   2.5%
custom            6700   2.3%
brown             6593   2.2%
yellow            2142   0.7%
Other values (2)  2671   0.9%

Most occurring characters

Value              Count   Frequency (%)
e                  227866  16.0%
l                  142025  10.0%
i                  122255  8.6%
r                  114466  8.1%
b                  100677  7.1%
w                  88020   6.2%
t                  85985   6.0%
h                  79285   5.6%
c                  69561   4.9%
a                  64845   4.6%
Other values (11)  326298  23.0%

Most occurring categories

Value      Count    Frequency (%)
(unknown)  1421283  100.0%

Most occurring scripts

Value      Count    Frequency (%)
(unknown)  1421283  100.0%

Most occurring blocks

Value      Count    Frequency (%)
(unknown)  1421283  100.0%

state
Text

Distinct      51
Distinct (%)  < 0.1%
Missing       0
Missing (%)   0.0%
Memory size   3.3 MiB

Length

Max length     2
Median length  2
Mean length    2
Min length     2

Characters and Unicode

Total characters     853760
Distinct characters  24
Distinct categories  1
Distinct scripts     1
Distinct blocks      1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique      0
Unique (%)  0.0%

Sample

1st row  az
2nd row  ar
3rd row  fl
4th row  ma
5th row  nc

Value              Count   Frequency (%)
ca                 50614   11.9%
fl                 28511   6.7%
tx                 22945   5.4%
ny                 19386   4.5%
oh                 17696   4.1%
or                 17104   4.0%
mi                 16900   4.0%
nc                 15277   3.6%
wa                 13861   3.2%
pa                 13753   3.2%
Other values (41)  210833  49.4%

Most occurring characters

Value              Count   Frequency (%)
a                  137111  16.1%
c                  91464   10.7%
n                  80937   9.5%
i                  67266   7.9%
o                  56973   6.7%
m                  56562   6.6%
t                  49156   5.8%
l                  47049   5.5%
f                  28511   3.3%
w                  26921   3.2%
Other values (14)  211810  24.8%

Most occurring categories

Value      Count   Frequency (%)
(unknown)  853760  100.0%

Most occurring scripts

Value      Count   Frequency (%)
(unknown)  853760  100.0%

Most occurring blocks

Value      Count   Frequency (%)
(unknown)  853760  100.0%

Interactions

[Figures: 16 interaction plots; the images are not preserved in this text export.]

Correlations

[Figure: correlation heatmap]

              condition  cylinders  drive   fuel    id      manufacturer  odometer  paint_color  price   size    title_status  transmission  type    year
condition     1.000      0.079      0.099   0.154   0.066   0.083         0.031     0.071        0.007   0.038   0.135         0.383         0.141   0.118
cylinders     0.079      1.000      0.386   0.198   0.024   0.338         0.021     0.074        0.000   0.315   0.040         0.156         0.244   0.081
drive         0.099      0.386      1.000   0.162   0.006   0.459         0.014     0.120        0.000   0.225   0.037         0.113         0.548   0.183
fuel          0.154      0.198      0.162   1.000   0.053   0.356         0.011     0.090        0.000   0.145   0.025         0.254         0.243   0.080
id            0.066      0.024      0.006   0.053   1.000   0.086         0.045     0.026        -0.079  0.007   0.013         0.046         0.050   -0.085
manufacturer  0.083      0.338      0.459   0.356   0.086   1.000         0.013     0.100        0.000   0.257   0.037         0.198         0.266   0.099
odometer      0.031      0.021      0.014   0.011   0.045   0.013         1.000     0.009        -0.457  0.004   0.031         0.024         0.008   -0.651
paint_color   0.071      0.074      0.120   0.090   0.026   0.100         0.009     1.000        0.000   0.080   0.023         0.134         0.094   0.085
price         0.007      0.000      0.000   0.000   -0.079  0.000         -0.457    0.000        1.000   0.000   0.000         0.007         0.000   0.491
size          0.038      0.315      0.225   0.145   0.007   0.257         0.004     0.080        0.000   1.000   0.021         0.133         0.333   0.045
title_status  0.135      0.040      0.037   0.025   0.013   0.037         0.031     0.023        0.000   0.021   1.000         0.061         0.031   0.081
transmission  0.383      0.156      0.113   0.254   0.046   0.198         0.024     0.134        0.007   0.133   0.061         1.000         0.284   0.256
type          0.141      0.244      0.548   0.243   0.050   0.266         0.008     0.094        0.000   0.333   0.031         0.284         1.000   0.093
year          0.118      0.081      0.183   0.080   -0.085  0.099         -0.651    0.085        0.491   0.045   0.081         0.256         0.093   1.000
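
In the matrix above, negative coefficients occur only between numeric variables (id, odometer, price, year), while every pair involving a categorical variable lies between 0 and 1. That pattern is what an unsigned association measure such as Cramér's V would produce. ydata-profiling's default "auto" correlation is documented as combining Spearman's rank correlation with Cramér's V, but the settings of this particular run are not shown in this section, so treat that as an assumption. The sketch below computes the plain, uncorrected Cramér's V for two categorical columns; the function name and the toy data are hypothetical, and the profiler may apply a bias correction that this version omits.

```python
from collections import Counter
from math import sqrt


def cramers_v(pairs):
    """Plain (uncorrected) Cramér's V for a list of (x, y) category pairs."""
    n = len(pairs)
    joint = Counter(pairs)                 # observed cell counts
    row = Counter(x for x, _ in pairs)     # marginal counts of x
    col = Counter(y for _, y in pairs)     # marginal counts of y
    chi2 = 0.0
    for x, row_total in row.items():
        for y, col_total in col.items():
            expected = row_total * col_total / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row), len(col)) - 1
    # With a single category on either side the statistic is undefined; use 0.
    return sqrt(chi2 / (n * k)) if k > 0 else 0.0


# Hypothetical toy data, not taken from the dataset.
perfectly_linked = [("4wd", "truck")] * 10 + [("fwd", "sedan")] * 10
unrelated = ([("4wd", "truck")] * 5 + [("4wd", "sedan")] * 5
             + [("fwd", "truck")] * 5 + [("fwd", "sedan")] * 5)

print(cramers_v(perfectly_linked))  # 1.0
print(cramers_v(unrelated))         # 0.0
```

On this scale, the reported 0.548 between drive and type is a moderate association, which matches the "High correlation" alert for that pair.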

Missing values

[Figure] A simple visualization of nullity by column.
[Figure] Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
[Figure] The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
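
Nullity correlation can be illustrated by replacing each column with a present/missing indicator and correlating the indicators. The sketch below assumes the heatmap follows the common convention (used by the missingno library) of Pearson correlation on those indicators; the function name and the toy columns are hypothetical and are not taken from the dataset.

```python
from math import sqrt


def nullity_correlation(col_a, col_b):
    """Pearson correlation between the presence indicators of two columns.

    None marks a missing value. Returns NaN when either column is fully
    present or fully missing, because its indicator then has zero variance.
    """
    a = [0.0 if v is None else 1.0 for v in col_a]
    b = [0.0 if v is None else 1.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return float("nan")
    return cov / sqrt(var_a * var_b)


# Hypothetical toy columns.
drive = ["fwd", None, "rwd", None, "4wd", None]
vehicle_type = ["sedan", None, "SUV", None, "truck", "pickup"]
paint = [None, "red", None, "blue", None, "white"]

print(nullity_correlation(drive, vehicle_type))  # positive: mostly missing together
print(nullity_correlation(drive, paint))         # -1.0: one present exactly when the other is missing
```

A value near 1 means two columns tend to be missing in the same rows, a value near -1 means one is present when the other is missing, and a value near 0 means the missingness patterns are unrelated.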

Sample

First rows

   id          region                  price  year  manufacturer  model  condition  cylinders  fuel  odometer  title_status  transmission  VIN  drive  size  type  paint_color  state
0  7222695916  prescott                6000   NaN   NaN           NaN    NaN        NaN        NaN   NaN       NaN           NaN           NaN  NaN    NaN   NaN   NaN          az
1  7218891961  fayetteville            11900  NaN   NaN           NaN    NaN        NaN        NaN   NaN       NaN           NaN           NaN  NaN    NaN   NaN   NaN          ar
2  7221797935  florida keys            21000  NaN   NaN           NaN    NaN        NaN        NaN   NaN       NaN           NaN           NaN  NaN    NaN   NaN   NaN          fl
3  7222270760  worcester / central MA  1500   NaN   NaN           NaN    NaN        NaN        NaN   NaN       NaN           NaN           NaN  NaN    NaN   NaN   NaN          ma
4  7210384030  greensboro              4900   NaN   NaN           NaN    NaN        NaN        NaN   NaN       NaN           NaN           NaN  NaN    NaN   NaN   NaN          nc
5  7222379453  hudson valley           1600   NaN   NaN           NaN    NaN        NaN        NaN   NaN       NaN           NaN           NaN  NaN    NaN   NaN   NaN          ny
6  7221952215  hudson valley           1000   NaN   NaN           NaN    NaN        NaN        NaN   NaN       NaN           NaN           NaN  NaN    NaN   NaN   NaN          ny
7  7220195662  hudson valley           15995  NaN   NaN           NaN    NaN        NaN        NaN   NaN       NaN           NaN           NaN  NaN    NaN   NaN   NaN          ny
8  7209064557  medford-ashland         5000   NaN   NaN           NaN    NaN        NaN        NaN   NaN       NaN           NaN           NaN  NaN    NaN   NaN   NaN          or
9  7219485069  erie                    3000   NaN   NaN           NaN    NaN        NaN        NaN   NaN       NaN           NaN           NaN  NaN    NaN   NaN   NaN          pa

Last rows

        id          region   price  year    manufacturer   model                         condition  cylinders    fuel    odometer  title_status  transmission  VIN                drive  size  type       paint_color  state
426870  7301592119  wyoming  22990  2020.0  hyundai        sonata se sedan 4d            good       NaN          gas     3066.0    clean         other         5NPEG4JAXLH051710  fwd    NaN   sedan      blue         wy
426871  7301591639  wyoming  17990  2018.0  kia            sportage lx sport utility 4d  good       NaN          gas     34239.0   clean         other         KNDPMCAC7J7417329  NaN    NaN   SUV        NaN          wy
426872  7301591201  wyoming  32590  2020.0  mercedes-benz  c-class c 300                 good       NaN          gas     19059.0   clean         other         55SWF8DB6LU325050  rwd    NaN   sedan      white        wy
426873  7301591202  wyoming  30990  2018.0  mercedes-benz  glc 300 sport                 good       NaN          gas     15080.0   clean         automatic     WDC0G4JB6JV019749  rwd    NaN   other      white        wy
426874  7301591199  wyoming  33590  2018.0  lexus          gs 350 sedan 4d               good       6 cylinders  gas     30814.0   clean         automatic     JTHBZ1BLXJA012999  rwd    NaN   sedan      white        wy
426875  7301591192  wyoming  23590  2019.0  nissan         maxima s sedan 4d             good       6 cylinders  gas     32226.0   clean         other         1N4AA6AV6KC367801  fwd    NaN   sedan      NaN          wy
426876  7301591187  wyoming  30590  2020.0  volvo          s60 t5 momentum sedan 4d      good       NaN          gas     12029.0   clean         other         7JR102FKXLG042696  fwd    NaN   sedan      red          wy
426877  7301591147  wyoming  34990  2020.0  cadillac       xt4 sport suv 4d              good       NaN          diesel  4174.0    clean         other         1GYFZFR46LF088296  NaN    NaN   hatchback  white        wy
426878  7301591140  wyoming  28990  2018.0  lexus          es 350 sedan 4d               good       6 cylinders  gas     30112.0   clean         other         58ABK1GG4JU103853  fwd    NaN   sedan      silver       wy
426879  7301591129  wyoming  30590  2019.0  bmw            4 series 430i gran coupe      good       NaN          gas     22716.0   clean         other         WBA4J1C58KBM14708  rwd    NaN   coupe      NaN          wy